ABSTRACT

The oral proficiency scale of the American Council on 
the Teaching of Foreign Languages (ACTFL) and the Educational Testing 
Service is reviewed as an instrument for measuring the levels of 
language proficiency for college undergraduates and high school 
students. The scale is a modification of the U.S. Government's 
Foreign Service Institute Scale, and its suitability for use in the 
schools has not been established. It is not clear to what extent the 
scale's progression reflects cognitive growth. However, the scale has 
been endorsed as part of the movement toward oral proficiency testing 
in the schools, a movement encouraged by the "Guidelines" of the 
ACTFL. It is contended here that it would be premature to discard the 
proficiency test altogether, but further research into its 
applicability for school settings is necessary. The positive effects 
that the proficiency movement has had on teaching cannot be ignored, 
but there is no necessary link between: (1) use of the oral 
proficiency test and a focus on communicative language teaching; and 
(2) meaningful language use. A 41-item list of references is 
included. (SLD) 
PROFICIENCY TESTING AND THE SCHOOLS



The ACTFL/ETS oral proficiency scale represents an 
adaptation of the governmental Foreign Service Institute scale. 
One of the aims in drawing up the ACTFL/ETS scale was to adapt 
the lower points of the FSI scale towards use in the academic 
environment. By making the scale more sensitive at the lower 
end it was hoped to offer a measurement of the levels of 
proficiency likely to be found among undergraduates and high 
school students. 

All of this is well known, but it is worth remarking 
that a somewhat similar procedure to that adopted by ACTFL/ETS 
had been used in the early years of the FSI scale. At this 
stage, FSI raters were empowered to grant minus as well as plus 
marks throughout the scale. According to Jones (1979, 105), 
"The minuses were later dropped, since it was found that the 
scale had become so refined that it was really not possible to 
make so many discriminations". Though discontinued at FSI, the 
practice of devising three gradations for a particular level was 
adopted by ACTFL/ETS. ACTFL/ETS proceeded to subdivide the two 
lowest rungs of the FSI ladder, levels 0 and 1, into Low, Mid 
and High. Higgs (1984) describes the informal study carried out 
at the Educational Testing Service in 197v which provided the 
major empirical backing for the innovation. ETS staff conducted 
oral interviews with 30 high school students. None was found to 
reach FSI level 1. Some were 0+ but most rated at 0. For Higgs, 
these findings confirmed the hypothesis that the lower end of the 
scale did not effectively discriminate and that extra 
subdivisions were necessary. Apart from this rather tenuous 
empirical base for the decision, a study of 30 high school 
students, it is not apparent that much further research was 
conducted before the new scale was issued. Certainly, Liskin- 
Gasparro's (1987) wide-ranging account of the history of the 
ACTFL procedure pays little attention to the question of the 
process by which the scales were drawn up. Her reference to the 
creation of the new ACTFL subdivisions is just one paragraph long 
(p.Sl). In this she briefly describes what she terms the 
"informal study" which purportedly validated the Low/Mid 
subdivisions. 

Leaving aside the shaky empirical status of the design 
of the ACTFL scale, it is questionable whether the objective of 
greater sensitivity has truly been achieved by the 
changes. We now have six possible levels towards the bottom, as 
distinct from four in the FSI scale, hardly a dramatic increase 
in sensitivity. And one of the levels, Novice Low, is defined 
in terms which describe a level of proficiency so minimal that it 
is of no interest to anybody. The new distinctions achieved a 
little higher in the ACTFL scale, such as that between 
Intermediate Low and Intermediate Mid, are unlikely to account 
for any substantial difference in the way a person's proficiency 
is viewed. It seems improbable that anyone would post an 
Intermediate Mid requirement for a job or as a measure of 
academic achievement, and disqualify those who had only attained 
Intermediate Low. In other words, it is not clear that the 
distinction is genuinely meaningful outside the classroom. If 
this is the case, it hardly justifies the expansion of the lower 
end of the FSI scale and all the subsequent fuss. 

The top point on the ACTFL scale is also quite 
problematic for several reasons. How applicable is the scale to 
adolescent students? How much of the scale's progression 
reflects cognitive growth? There is abundant evidence that 
the kinds of linguistic operations called for at the high end of 
the ACTFL scale are actually cognitively and developmentally 
determined. In the first language, we know that logical ability 
increases with age during adolescence (Byrnes and Overton 1988) 
as do scores on verbal reasoning (Sternberg and Downing 1982). 
Similarly, scores on syllogistic reasoning tests can increase 
even through the teens (Sternberg 1979, Tallin et al. 1974), as 
does the ability to comprehend metaphorical speech (Kogan et al. 
1980). Nippold and others (Nippold 1988, 220) found that the 
ability to understand ambiguity increases through the late teens. 
In other words, many of the skills needed to achieve a score of 
Superior on the ACTFL scale are skills that are still only being 
acquired by adolescents in their first language. The Superior 
speaker is supposedly able to "participate effectively in most 
formal and informal conversations on practical, social, 
professional and abstract topics and (can) support opinions and 
hypothesize using native-like discourse strategies." How 
many adolescents fit this description in their first language? 
Even the ability to narrate, supposedly typical not of 
Superior but of Advanced level, is still in first-language 
developmental flux throughout the teenage years (Nippold 1988, 
S53-58). 

We thus need to ask how well adolescents would do on an 
OPI in their own language. Would they score at the highest 
point, Superior? Unfortunately, there are no published 
findings on how the proficiency interview handles speakers of 
any age in their first language. The test sets out to measure 
how non-natives perform, but it has never found out how natives 
perform. We might be forgiven for presuming that a person will 
always score at the highest level in his native language, but 
this is not necessarily so. The ideal represented by Level 5 
of the FSI scale, the "educated native speaker", is one that is 
attained by few. Lowe maintains that only a minority of native 
speakers qualify for the highest ratings: "ILR experience shows 
that the majority of native speakers of English probably fall at 
level 3" (1987, p.8). If most native speakers score at level 3 
(Superior) we must presume that some, perhaps many, score 
below this; certainly this possibility is often raised in 
ACTFL/ETS training sessions. And if this is the case for adults, 
it must be even more so for adolescents. If the top of the 
scale is in reality inaccessible to adolescents in their native 
language, as it undoubtedly is to pre-adolescents, is this the 
kind of scale we want to use in the high school? 

Despite the lack of evidence that the scale is validly 
applicable to the school setting, the proficiency movement has 
had a significant impact on curricula and testing at the high 
school level in many states (Cummins 1987). Magnan (1986, 
433) tells us that "Wisconsin has recently published a new 
curriculum guide for secondary education, based in large part on 
the ACTFL Proficiency guidelines. The guide suggests a range of 
novice-high to intermediate low for the second years of high 
school instruction" and Intermediate-low to Intermediate-High 
for 3rd and 4th year. Gutierrez (1988, 916) reports that "Many 
of the (states') curriculum guides for foreign languages are 
couched in jargon that is taken, verbatim, from the ACTFL 
guidelines. In Virginia, for example, the document ... has 
Intermediate-High as the exit requirement for speaking at the end 
of the fourth year of high school language study." A growing 
number of foreign language texts, particularly at elementary 
level, claim to reflect a proficiency-oriented methodology. 
Indeed the scales have acquired the status of an oracle in some 
circles: Levine, Haus and Cort (1987) worry because they find 
that language teachers' judgments of their students' ability do 
not concur with those of ACTFL raters. They never raise the 
question of how canonical those very ACTFL ratings may be; indeed, 
what does it mean to make an "accurate" judgment of a person's 
proficiency? Incidentally, it is worth pointing out that 
Levine and his colleagues did NOT use the Superior level in 
their study, because they realized that none of their students 
would make this level. 

Not just the student but also others involved in 
education may be affected by the Guidelines. Dwyer and Hiple 
(1988) mention such applications of the ACTFL procedure as 
the awarding of grants, admission to Summer Institutes or 
qualification for funding. Millman (1988), outlining an 
Alabama Commission on Higher Education grant for foreign study, 
mentions that the grant requires "that recipients have pre- and 
post-grant proficiency ratings as a measure of accountability". 
Hiple and Manley (1987, 153) describe how Texas is making the 
attainment of certain proficiency standards obligatory for 
teacher certification. The State Board of Education passed a 
measure requiring that future foreign language teachers have their 
oral proficiency assessed "using procedures, criteria, and a passing 
score in accordance with the ACTFL guidelines". In short, the 
ACTFL scale is beginning to be used in real-life decisions of 
substantive importance to individuals. 

Though proficiency scales exist for Listening, 
Speaking, Reading and Writing, the only modality for which an 
elicitation mechanism exists is the oral (interview). ACTFL has 
come up with no standardized format for measuring listening, 
reading or writing. This in itself is something of an anomaly, 
given the stress that ACTFL has placed on the training and 
certification of oral interviewers. It seems illogical to 
place such great emphasis on controlling through certification 
those who are supposedly trained to elicit and rate oral language 
while at the same time having no form of control, indeed no 
standardized format to follow, for those who are to elicit 
ability in other modalities. 

Yet Glisan and Phillips (1988, 589), describing a 
Department of Education-funded program for the preparation of 
FLES teachers, state how expected language improvement for those 
who participate in the program is to be defined. The prospective 
student teachers are supposed to develop skills in Speaking, 
Listening, Reading and Writing as defined in terms of certain 
points on the ACTFL scales. 

The goals of a part of the Glisan/Phillips program are 
given as Listening I-M, Reading I-L, Writing I-L, Speaking 
I-L. One has to ask how these ratings are to be made, since, 
as has been pointed out, we have no elicitation mechanism for 
anything other than Speaking. Were any participants to object to 
the score they received on any of these components, it is hard 
to see how the rating could be justified before any neutral 
party. For instance, Lee and Musumeci (1988) have shown that 
the reading levels do not exist as separate hierarchical 
entities. Elsewhere, even Phillips herself (1988, p. 138) 
admits that some students don't necessarily have to go through 
the hierarchical stages posited for reading. She says that this 
is so because reading is not a natural skill; it is learned. 
Thus a person could score at Advanced before scoring at 
Intermediate, making a nonsense of the entire scale. The 
position is no better for another of the four skills. Valdes et 
al. (1988, 421) report a study which seems to show that real-life 
learners do not follow the ACTFL progression in listening 
comprehension either. 

As a practical instrument, the ACTFL Guidelines have 
been quite successful in winning adherents, especially among 
administrators and supervisors. For some commentators, they 
have offered to provide an "organizing principle" (Higgs 1984), 
a unified way of looking at the many divergent procedures and 
methodologies employed in the foreign language classroom. The 
agenda for conventions of foreign language teachers reflects the 
continued influence of the proficiency movement. Whatever the 
status of the scales with testing theorists and specialists, they 
continue to make the running in foreign language teaching. 
Decisions are being made on the basis of the ACTFL guidelines, 
and this trend may even accelerate in the next year or two. 
Indeed, Magnan (1988, 274) speaks of "strong suggestions that the 
OPI serve as a national proficiency examination". 

Just a couple of years after the publication of the 
1982 guidelines, Gasparro wrote that "although problems still 
remain, they are logistical rather than theoretical" (1984, 
p. 39). Of course this was not the case then, and it is 
certainly not so now. Even in 1982, Frink, commenting on the 
initial efforts to adapt the FSI interview to the academic world, 
wrote (p. 282) "although the FSI interview remains the best 
established test of oral proficiency, it is not necessarily the 
most readily applicable to high school and college students, 
even with a modified rating scale. It is based on the premise 
that the person being interviewed is an adult who will work 
abroad and assesses ability to function professionally in the 
target language. Many high school and college students are not 
yet equipped with any professional vocabulary or with the 
experience and self-assurance to perform professional-level 
language tasks". Hummel (19^^^, 1^), another early critic of 
the ACTFL procedure, believed that the guidelines "fail to 
distinguish between general cognitive skills that are independent 
of the level of proficiency in the target language and language 
skills that are related to achievement in the target language", 
a criticism that has still not been refuted. Stated otherwise, 
the scales reflect dualities such as Cummins' (1980) Basic 
Interpersonal Communicative Skills/Cognitive-Academic Language 
Proficiency dichotomy, or even Bernstein's (1971) hypothesis 
of the existence of separate "elaborated" and "restricted" codes. 
The ACTFL scale associates one group of language functions and 
contexts, what ACTFL would call a level (Advanced), with 
cognitively undemanding, everyday uses of language, exercised 
in highly contextualized interpersonal situations. Another level 
(Superior) is associated with academic learning and intellectual 
discourse, and it is not accidental which level places higher on 
the hierarchy. Generally, the language favored in the oral 
interview is what Spolsky in a slightly different context terms 
"the variety of academic language chosen as its ideal by the 
western literate tradition. The style is one that favors 
autonomous verbalization, that idealizes the communication to 
relative strangers of the maximum amount of new knowledge using 
only verbal means" (193^, p.<f3). 

Lantolf and Frawley (1985) and Barnwell (1989) have 
attempted to show the specious nature of the proficiency scale's 
invocation of the native speaker. The ACTFL scale was conceived 
for and is oriented to a particular milieu, the US academic 
environment, where the majority of language learners are 
Anglophone monolinguals of a certain age-group. Note that, 
by now, seven years after their initial publication, ACTFL has 
not yet translated the generic scales into even the most commonly 
tested languages such as Spanish and French. Since most people 
in France, Germany, Spain or Latin America do not know English, 
they couldn't use the ACTFL scale. Can ACTFL predict what native 
speakers will do with the scale when the scale is inaccessible 
to the vast majority of native speakers? A rather obvious 
requirement for validation of the claims about native speakers 
would be to provide versions which the native speakers could 
understand. 

How effective, efficient, and valid is the process of 
ACTFL training and certification in the preparation of those who 
will administer the oral proficiency interview? 
Investigations with the FSI scale show that a prolonged period of 
training does not appear to be necessary for this purpose. In 
the case of the ACTFL procedure, inter-rater reliability would 
hardly be compromised by the adoption of a less rigorous training 
process for those who are to use the test. Barnwell (1987) 
found that informally trained raters could reach a high degree of 
concordance in their ratings after a relatively brief period of 
practice with the ACTFL scale. It may be that the cost of the 
ACTFL/ETS training program is excessive, both in terms of time 
and money. 

More fundamental than considerations of reliability are 
considerations of validity. How valid is the training process 
devised by ACTFL? There are grounds for suspecting that the 
training of interviewer/raters involves a process of 
socialization and group identification with an interpretation of 
the proficiency construct which is American rather than native 
speaker in origin. A proper set of studies of the validity of 
the test would have to face the problem of how to include a wider 
cross-section of raters, including those who would not 
ordinarily volunteer to take part in psycholinguistic 
experiments. As Politzer (1978) and Vann and others (1984) 
found, factors such as age and educational background have a 
significant effect on how raters view candidates' performance. 
Other considerations such as the raters' familiarity with the 
native language of the speaker, or previous exposure to learners 
from a particular foreign language background, have a heavy 
bearing on how errors are viewed (Gass and Varonis 1984). It 
seems possible that the more language testing involves native 
speakers, with all their differing attitudes, prejudices and 
idiosyncrasies, the more problematic will be the use of any 
blanket native speaker norm. Indeed, one study found that a 
sample of native speakers of Spanish in Barcelona were 
consistently more severe in their judgments of American students' 
performance on the OPI than were ACTFL-trained judges (Barnwell 
1988). If natives are consistently more lenient, or more 
severe, in their judgments than are so-called testing experts, 
who are we to believe? Are the experts wrong? If so they are 
hardly experts. Are the natives wrong? If so we had better 
rethink not just our tests but also our texts, indeed the 
methodologies we use and the orientation we give to our entire 
language programs. 

Another psychometric deficiency in the scale is worth 
mentioning. The audience for proficiency, or persons with whom 
the speaker will have to interact in the foreign language, is 
given two separate characterizations in the Guidelines. At the 
lower end, the "sympathetic" nature of the interlocutor is 
invoked. Further up, references are made to the "native 
speaker". So two different norms are used; "sympathetic 
interlocutors" are not the same as "native speakers". But, 
however unamenable to definition the phrase "native speaker" may 
be, the "sympathetic interlocutor" is even more nebulous. Surely 
what we are really seeing in the term "sympathetic interlocutor" 
is no more than a circumlocution for "classroom teacher", since 
one can think of few other non-native interlocutors likely to be 
encountered by speakers at the lower levels. This certainly is 
the impression one receives in reading Galloway's (65-69) 
treatment of the topic. Though she offers several pages on 
interlocutor characteristics, she restricts herself to 
considering the roles of teacher or interviewer. She fails to 
address the psychometric problems involved in defining the 
"sympathetic interlocutor". If "sympathetic interlocutors" are 
really just classroom teachers, then the scale should say so. 

Several observers (Bart 1986, Kramsch 1986) believe 
that a stress on oral proficiency inevitably leads to a neglect 
of the many other objectives of foreign language learning, those 
values which traditionally provided the rationale for the place 
of foreign languages in the curriculum. Since it is a fact that 
only a small proportion of our students, be they high school or 
college, will ever have much occasion to exercise their oral 
proficiency, should we be willing to define the goals of years 
of activity and study in terms of the needs of a minority? If 
proficiency is communicative success, can we not look for a 
richer definition of communication, one that encompasses such 
things as communication with other peoples' pasts and present, 
the ability to derive pleasure and benefit from the great 
achievements in a foreign language and culture? 

There are millions of persons in this country who 
possess a command of a second language far in advance of anything 
we might impart to monolinguals in school or college. These 
are the bilingual speakers, and they represent a vast resource 
which has as yet been but little tapped. At a time when the 
education systems in several states are being swamped with 
children of non-English speaking background, there is a special 
need for some uniform, widely accepted, and validated metric 
for the assessment of these children's skills in both languages. 
Equally, given the demand for bilingual teachers and social 
workers, for example, there is an urgent need to establish some 
means of gauging the extent of individuals' proficiency in the 
languages they claim to know. These speakers present a 
particular profile, obviously distinct from that of the 
Anglophone students of foreign languages. In the case of U.S. 
Hispanics, for instance, only a small proportion have received 
all their education in Spanish. Hence their opportunities to 
acquire and practice the academic and intellectual register 
required for Superior level have been restricted. Though they 
may speak Spanish in diverse situations at home and at work, and 
function perfectly well in the language, they will not 
always attain the rating of Superior in their own language. 

It might be countered, in objection to some of the 
points raised here, that we are ignoring the positive effects 
that the proficiency movement has had in teaching practices and 
texts. However, there is no necessary link between the ACTFL 
proficiency test and the focus on communicative language 
teaching, meaningful language use, and authentic materials that 
characterizes some of our best classes at any level. Indeed, 
parallel practices to these can be found in foreign language 
teaching in Europe today, even though the proficiency movement 
is almost unknown there. Language teaching in the U.S. in the 
1970s was already evolving in the direction that the proficiency 
movement has sought to claim as its own. Witness the debate 
about "communicative competence" that predates the publication of 
the ACTFL Guidelines. This trend would surely have been 
maintained throughout the 1980s, regardless of whether ACTFL had 
ever published its Guidelines. 

It would be premature to endorse the view articulated 
by Lantolf and Frawley (1988), that the proficiency test is so 
fundamentally flawed as to call for a moratorium on its use. 
Surely we should not quickly discard the FSI interview and the 
thirty-year tradition it embodies. But the research has to come 
first, and inflated claims must be refuted. Had the ACTFL 
procedure been a drug or domestic appliance it would long ago 
have been withdrawn from the market, since its proponents have 
supplied no proof that it does what it claims to do. It is all 
development, and no research. In the foreign language teaching 
profession practice has often lagged behind theory. In the case 
of "proficiency", however, perhaps it has been the other way 
around. It's time to slow down, reflect a while, and give the 
theory time to catch up. 